An Advanced Partitioning Approach of Web Page Clustering utilizing Content & Link Structure
Authors
Abstract
Clustering non-homogeneous documents has become both a growing challenge and an opportunity with the rapid proliferation of the World Wide Web. As the amount of information on the WWW increases, it becomes difficult to retrieve the desired information without proper clustering of Web pages. Several new ideas have been proposed in recent years; among them, the partitioning approach is still widely used because of its simplicity. This paper proposes a partitioning approach that clusters Web pages using the information provided both by their hyperlink structure and by their content. Because its centroids are chosen by the HITS algorithm, the proposed approach gives better results than the K-medoid partitioning approach. Partitioning approaches such as K-medoid and K-means require the number of clusters a priori, and their performance has been observed to depend on the initial selection of cluster centroids. The approach proposed in this paper solves both problems. Experimental results support the proposed approach as the better one.
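The abstract does not spell out the exact seeding or similarity rules, so the following is only a minimal Python sketch under stated assumptions. HITS authority scores pick the initial centroids, and every other page joins the centroid it is most similar to under a weighted mix of content similarity (cosine over term counts) and link similarity (Jaccard overlap of link neighbourhoods). The names `hits` and `cluster_pages`, the weight `alpha`, the threshold `auth_ratio`, and the rule "every page whose authority is at least `auth_ratio` times the maximum becomes a centroid" are all hypothetical; they illustrate how HITS can fix both the number and the placement of initial centroids, not the paper's actual rule.

```python
import math
from collections import Counter


def hits(links, iterations=50):
    """Plain iterative HITS: returns (hub, authority) score dicts.

    `links` maps each page id to the set of page ids it links to.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of p: sum of hub scores of the pages that link to p.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub of p: sum of authority scores of the pages p links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # L2-normalise both score vectors so the iteration converges.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters (content signal)."""
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Jaccard overlap of two link neighbourhoods (link signal)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pages(texts, links, alpha=0.5, auth_ratio=0.5):
    """Hypothetical HITS-seeded partitioning of the pages in `texts`."""
    _, auth = hits(links)
    # Assumed seeding rule: pages with authority >= auth_ratio * best
    # authority become centroids, so k is not fixed in advance.
    top = max(auth.get(p, 0.0) for p in texts)
    centroids = sorted(p for p in texts if auth.get(p, 0.0) >= auth_ratio * top)

    vectors = {p: Counter(t.lower().split()) for p, t in texts.items()}
    # Link neighbourhood: the page itself, its out-links and its in-links.
    neigh = {p: {p} | set(links.get(p, ())) for p in texts}
    for src, targets in links.items():
        for dst in targets:
            if dst in neigh:
                neigh[dst].add(src)

    def similarity(p, c):
        # Weighted mix of content and link similarity (alpha is assumed).
        return (alpha * cosine(vectors[p], vectors[c])
                + (1 - alpha) * jaccard(neigh[p], neigh[c]))

    clusters = {c: [] for c in centroids}
    for p in texts:
        # A centroid stays in its own cluster; others join the most similar.
        best = p if p in clusters else max(centroids, key=lambda c: similarity(p, c))
        clusters[best].append(p)
    return clusters
```

With six toy pages where `a` and `b` link to `c`, and `d` and `e` link to `f`, pages `c` and `f` receive the top authority scores and become the two centroids, and the programming pages and the cooking pages end up in separate clusters. A single assignment pass is used here for brevity; a K-medoid style refinement loop could be layered on top.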
Similar resources
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
A Survey Paper of Structure Mining Technique using Clustering and Ranking Algorithm
A survey of various link analysis and clustering algorithms such as Page Rank, Hyperlink-Induced Topic Search, Weighted Page Rank based on Visit of Links, K-Means and Fuzzy K-Means. Among the ranking algorithms illustrated, Weighted Page Rank is more efficient than Hyperlink-Induced Topic Search, whereas among the clustering algorithms described, Fuzzy Soft Rough K-Means is a mixture of Rough K-Means and fuzzy soft...
Enhancing Navigability in Websites Built Using Web Content Management Systems
Websites built using Web Content Management Systems (WCMSs) usually provide their users with three alternative access structures to surf their contents: indexes of categories, breadcrumb trails, and sitemaps. In addition, to find contents of his/her interest, a user can perform more or less advanced full-text searches. In this paper we propose an automatic approach to extend the navigation stru...
Enhancing Contents-Link Coupled Web Page Clustering and Its Evaluation
Web page clustering is a fundamental technique for data management, information location and interpretation of Web data, and for helping users with navigation, discrimination and understanding. Most existing clustering algorithms cannot be adapted well to Web clustering directly in terms of efficiency and effectiveness. Combining contents analysis and hyperlink structure anal...
A General Approach for Partitioning Web Page Content Based on Geometric and Style Information
In this paper, we describe a general-purpose approach for partitioning Web page content. Our approach uses no ontologies or domain-specific information. The novelty of our approach lies in: the use of visual layout information rather than the DOM tree to determine spatial locality; the use of relaxed matching over presentation style information to determine presentation style similarity; and th...
Journal: JCIT, Volume 4
Publication date: 2009